arXiv.org e-Print Archive

Archive ouverte UNIGE

Non-alignment comparison of human and high primate genomes

Author: Alkan Can
Bailey Jeffrey A.
Eichler Evan E.
Green Eric D.
Liu Ge
Program NISC Comparative Sequencing
Sahinalp S. Cenk
Tuzun Eray
Zhao Shaying
Publication venue: 'Cold Spring Harbor Laboratory'
Publication date: 01/03/2003

Compositional spectra (CS) analysis based on k-mer scoring of DNA sequences was employed in this study for dot-plot comparison of human and primate genomes. The detection of extended conserved synteny regions was based on continuous fuzzy similarity rather than on chains of discrete anchors (genes or highly conserved noncoding elements). In addition to the high correspondence found in the comparisons of whole-genome sequences, a good similarity was also found after masking gene sequences, indicating that CS analysis manages to reveal phylogenetic signal in the organization of the noncoding part of the genome sequences, including repetitive DNA and the genome "dark matter". Obviously, the possibility to reveal parallel ordering depends on the signal of common ancestor sequence organization varying locally along the corresponding segments of the compared genomes. We explored two sources contributing to this signal: sequence composition (GC content) and sequence organization (abundances of k-mers in the usual A,T,G,C or purine-pyrimidine alphabets). Whole-genome comparisons based on GC distribution along the analyzed sequences indeed give reasonable results, but combining it with k-mer abundances dramatically improves the ordering quality, indicating that compositional and organizational heterogeneity comprise complementary sources of information on evolutionarily conserved similarity of genome sequences.
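The two signal sources named in this abstract, GC content and k-mer abundances, can be illustrated with a minimal Python sketch. This is not the authors' CS implementation: the function names, the fixed non-overlapping windows, the exact (non-fuzzy) k-mer matching, and the use of cosine similarity are all assumptions chosen for illustration.

```python
from collections import Counter
from itertools import product
from math import sqrt


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases of seq."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def kmer_spectrum(seq, k=3):
    """Relative abundance of every A/C/G/T k-mer, in a fixed lexicographic order."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts[m] for m in kmers)  # k-mers with ambiguous bases are ignored
    return [counts[m] / total if total else 0.0 for m in kmers]


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def window_similarity_matrix(seq_a, seq_b, window=1000, k=3):
    """Dot-plot-style matrix: similarity of each window of seq_a to each window of seq_b."""
    def windows(s):
        return [s[i:i + window] for i in range(0, len(s) - window + 1, window)]

    spectra_a = [kmer_spectrum(w, k) for w in windows(seq_a)]
    spectra_b = [kmer_spectrum(w, k) for w in windows(seq_b)]
    return [[cosine_similarity(a, b) for b in spectra_b] for a in spectra_a]
```

Under these assumptions, a run of high values along a diagonal of the matrix would correspond to segments with parallel ordering in the two sequences; a per-window `gc_content` value could be appended to each spectrum to combine both signals.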

A custom capture sequence approach for oculocutaneous albinism identifies structural variant alleles at the OCA2 locus

Author: Adams David R
Baxter Laura L
Jackson Ian J
Loftus Stacie K
Lundh Linnea
Oetting William S
Pairo-Castineira Erola
Pavan William J
Program Nisc Comparative Sequencing
Watkins-Chow Dawn E
Publication venue: 'Wiley'
Publication date: 10/07/2021

Oculocutaneous albinism (OCA) is a heritable disorder of pigment production that manifests as hypopigmentation and altered eye development. Exon sequencing of known OCA genes is unsuccessful in producing a complete molecular diagnosis for a significant number of affected individuals. We sequenced the DNA of individuals with OCA using short-read custom capture sequencing that targeted coding, intronic and non-coding regulatory regions of known OCA genes and GWAS-associated pigmentation loci. We identified an OCA2 complex structural variant (CxSV), defined by a 143kb inverted segment reintroduced in intron 1, upstream of the native location. The corresponding CxSV junctions were observed in 11/390 probands screened. The 143kb CxSV presents in one family as a copy number variant (CNV) duplication for the 143kb region. In the remaining 10/11 families, the 143kb CxSV acquired an additional 184kb deletion across the same region, restoring exons 3–19 of OCA2 to a copy-number neutral state. Allele-associated haplotype analysis found rare SNVs rs374519281 and rs139696407 are linked with the 143kb CxSV in both OCA2 alleles. For individuals in which customary molecular evaluation does not reveal a biallelic OCA diagnosis, we recommend preliminary screening for these haplotype-associated rare variants, followed by junction-specific validation for the OCA2 143kb CxSV

Edinburgh Research Explorer

Gene-Specific Substitution Profiles Describe the Types and Frequencies of Amino Acid Changes during Antibody Somatic Hypermutation

Author: Chaim A. Schramm
James C. Mullikin
John R. Mascola
Lawrence Shapiro
NISC Comparative Sequencing Program
Peter D. Kwong
Rui Kong
Zizhang Sheng
Publication venue: 'Frontiers Media SA'
Publication date: 01/05/2017

Somatic hypermutation (SHM) plays a critical role in the maturation of antibodies, optimizing recognition initiated by recombination of V(D)J genes. Previous studies have shown that the propensity to mutate is modulated by the context of surrounding nucleotides and that SHM machinery generates biased substitutions. To investigate the intrinsic mutation frequency and substitution bias of SHMs at the amino acid level, we analyzed functional human antibody repertoires and developed mGSSP (method for gene-specific substitution profile), a method to construct amino acid substitution profiles from next-generation sequencing-determined B cell transcripts. We demonstrated that these gene-specific substitution profiles (GSSPs) are unique to each V gene and highly consistent between donors. We also showed that the GSSPs constructed from functional antibody repertoires are highly similar to those constructed from antibody sequences amplified from non-productively rearranged passenger alleles, which do not undergo functional selection. This suggests the types and frequencies, or mutational space, of a majority of amino acid changes sampled by the SHM machinery to be well captured by GSSPs. We further observed the rates of mutational exchange between some amino acids to be both asymmetric and context dependent and to correlate weakly with their biochemical properties. GSSPs provide an improved, position-dependent alternative to standard substitution matrices, and can be used to develop software for accurately modeling the SHM process. GSSPs can also be used for predicting the amino acid mutational space available for antigen-driven selection and for understanding factors modulating the maturation pathways of antibody lineages in a gene-specific context.
The mGSSP method can be used to build, compare, and plot GSSPs; we report the GSSPs constructed for 69 common human V genes (DOI: 10.6084/m9.figshare.3511083) and provide high-resolution logo plots for each (DOI: 10.6084/m9.figshare.3511085).
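The idea of a position-dependent substitution profile can be sketched in a few lines of Python. This is a simplified illustration, not the mGSSP software: the function name `substitution_profile`, the assumption that reads are already aligned to the germline V gene at equal length, and the treatment of `-` and `X` as missing data are all hypothetical choices.

```python
from collections import Counter, defaultdict


def substitution_profile(germline, aligned_reads):
    """For each germline position, return the frequency of each substituted amino acid.

    germline      -- germline amino acid sequence of one V gene
    aligned_reads -- sequences already aligned to germline (same length);
                     '-' (gap) and 'X' (unknown) are skipped as missing data
    """
    counts = defaultdict(Counter)
    for read in aligned_reads:
        if len(read) != len(germline):
            raise ValueError("reads must be aligned to the germline length")
        for pos, (germ_aa, read_aa) in enumerate(zip(germline, read)):
            if read_aa != germ_aa and read_aa not in "-X":
                counts[pos][read_aa] += 1
    # Normalize counts at each mutated position into frequencies
    return {
        pos: {aa: n / sum(c.values()) for aa, n in c.items()}
        for pos, c in counts.items()
    }
```

For example, `substitution_profile("CAR", ["CAK", "CAT", "CAK"])` reports substitutions only at position 2, with K at two thirds and T at one third of the observed changes. A real pipeline would also need read alignment, clonal de-duplication, and a minimum-depth threshold, none of which are shown here.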

Columbia University Academic Commons

Developmental Pathway of the MPER-Directed HIV-1-Neutralizing Antibody 10E8

Author: Alam S. Munir
Connors Mark
Eudailey Joshua
Haynes Barton F.
Huang Jinghe
Joyce M. Gordon
Kwong Peter D.
Lloyd Krissey E.
Longo Nancy S.
Mascola John R.
McKee Krisha
Mullikin James C.
NISC Comparative Sequencing Program
Ofek Gilad
Parks Robert
Shapiro Lawrence S.
Soto Cinque
Yang Yongping
Zhang Baoshan
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2016

Antibody 10E8 targets the membrane-proximal external region (MPER) of HIV-1 gp41, neutralizes >97% of HIV-1 isolates, and lacks the auto-reactivity often associated with MPER-directed antibodies. The developmental pathway of 10E8 might therefore serve as a promising template for vaccine design, but samples from time-of-infection—often used to infer the B cell record—are unavailable. In this study, we used crystallography, next-generation sequencing (NGS), and functional assessments to infer the 10E8 developmental pathway from a single time point. Mutational analysis indicated somatic hypermutation of the 2nd-heavy chain-complementarity determining region (CDR H2) to be critical for neutralization, and structures of 10E8 variants with V-gene regions reverted to genomic origin for heavy-and-light chains or heavy chain-only showed structural differences >2 Å relative to mature 10E8 in the CDR H2 and H3. To understand these developmental changes, we used bioinformatic sieving, maximum likelihood, and parsimony analyses of immunoglobulin transcripts to identify 10E8-lineage members, to infer the 10E8-unmutated common ancestor (UCA), and to calculate 10E8-developmental intermediates. We were assisted in this analysis by the preservation of a critical D-gene segment, which was unmutated in most 10E8-lineage sequences. UCA and early intermediates weakly bound a 26-residue-MPER peptide, whereas HIV-1 neutralization and epitope recognition in liposomes were only observed with late intermediates. Antibody 10E8 thus develops from a UCA with weak MPER affinity and substantial differences in CDR H2 and H3 from the mature 10E8; only after extensive somatic hypermutation do 10E8-lineage members gain recognition in the context of membrane and HIV-1 neutralization

Carolina Digital Repository

FigShare

Genetic effects on liver chromatin accessibility identify disease regulatory variants

Author: Albanus R.D.
Bonnycastle L.L.
Broadaway K.A.
Chaudhry A.S.
Collins F.S.
Currin K.W.
Didion J.P.
Erdos M.R.
Etheridge A.S.
Idol J.R.
Innocenti F.
Mohlke K.L.
Narisu N.
NISC Comparative Sequencing Program
Orchard P.
Parker S.C.J.
Perrin H.J.
Rai V.
Schuetz E.G.
Scott L.J.
Vadlamudi S.
Yan T.
Publication venue: Cell Press
Publication date: 01/01/2021

Identifying the molecular mechanisms by which genome-wide association study (GWAS) loci influence traits remains challenging. Chromatin accessibility quantitative trait loci (caQTLs) help identify GWAS loci that may alter GWAS traits by modulating chromatin structure, but caQTLs have been identified in a limited set of human tissues. Here we mapped caQTLs in human liver tissue in 20 liver samples and identified 3,123 caQTLs. The caQTL variants are enriched in liver tissue promoter and enhancer states and frequently disrupt binding motifs of transcription factors expressed in liver. We predicted target genes for 861 caQTL peaks using proximity, chromatin interactions, correlation with promoter accessibility or gene expression, and colocalization with expression QTLs. Using GWAS signals for 19 liver function and/or cardiometabolic traits, we identified 110 colocalized caQTLs and GWAS signals, 56 of which contained a predicted caPeak target gene. At the LITAF LDL-cholesterol GWAS locus, we validated that a caQTL variant showed allelic differences in protein binding and transcriptional activity. These caQTLs contribute to the epigenomic characterization of human liver and help identify molecular mechanisms and genes at GWAS loci

Initial Sequence and Comparative Analysis of the Cat Genome

The genome sequence (1.9-fold coverage) of an inbred Abyssinian domestic cat was assembled, mapped, and annotated with a comparative approach that involved cross-reference to annotated genome assemblies of six mammals (human, chimpanzee, mouse, rat, dog, and cow). The results resolved chromosomal positions for 663,480 contigs, 20,285 putative feline gene orthologs, and 133,499 conserved sequence blocks (CSBs). Additional annotated features include repetitive elements, endogenous retroviral sequences, nuclear mitochondrial (numt) sequences, micro-RNAs, and evolutionary breakpoints that suggest historic balancing of translocation and inversion incidences in distinct mammalian lineages. Large numbers of single nucleotide polymorphisms (SNPs), deletion insertion polymorphisms (DIPs), and short tandem repeats (STRs), suitable for linkage or association studies were characterized in the context of long stretches of chromosome homozygosity. In spite of the light coverage capturing ∼65% of euchromatin sequence from the cat genome, these comparative insights shed new light on the tempo and mode of gene/genome evolution in mammals, promise several research applications for the cat, and also illustrate that a comparative approach using more deeply covered mammals provides an informative, preliminary annotation of a light (1.9-fold) coverage mammal genome sequence

NSU Works

Revealing mammalian evolutionary relationships by comparative analysis of gene clusters

Author: Abi-Rached
Akahoshi
Bailey
Benjamin Dickins
Birney
Cadavid
Cathy Riemer
Chen
Chih-Hao Hsu
Chiu
Colobran
Datta
Degenhardt
Dewey
Dufayard
Edwards
Eric D. Green
Fitch
Fitch
Fitch
Giltae Song
Gish
Gonzalez
Goodstadt
Graef
Guethlein
Guethlein
Han
Hardies
Hardison
Hardison
Hardison
Harris
Hie Lim Kim
Hoffmann
Hou
Hou
Hsu
Hsu
Hu
Huerta-Cepas
Jensen
Johnson
Kim
Kristensen
Lee
Levy
Li
Li
Lopez-Vazquez
Louxin Zhang
Margulies
Martin
Matsuya
Mi
Miyata
Muller
Murphy
NISC Comparative Sequencing Program
Opazo
Opazo
Ostlund
Ouzounis
Parham
Pianezza
Rajalingam
Ross C. Hardison
Sambrook
Shilling
Siepel
Smit
Song
Song
Song
Sonnhammer
Su
Tatusov
The ENCODE Project Consortium
Uchiyama
van der Heijden
Vilella
Wang
Wapinski
Waterhouse
Webb Miller
Wilson
Wilson
Woelk
Yu Zhang
Zhang
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2012

Many software tools for comparative analysis of genomic sequence data have been released in recent decades. Despite this, it remains challenging to determine evolutionary relationships in gene clusters due to their complex histories involving duplications, deletions, inversions, and conversions. One concept describing these relationships is orthology. Orthologs derive from a common ancestor by speciation, in contrast to paralogs, which derive from duplication. Discriminating orthologs from paralogs is a necessary step in most multispecies sequence analyses, but doing so accurately is impeded by the occurrence of gene conversion events. We propose a refined method of orthology assignment based on two paradigms for interpreting its definition: by genomic context or by sequence content. X-orthology (based on context) traces orthology resulting from speciation and duplication only, while N-orthology (based on content) includes the influence of conversion events

Nottingham Trent Institutional Repository (IRep)

ScholarBank@NUS

Insertion Sequence IS26 Reorganizes Plasmids in Clinically Isolated Multidrug-Resistant Bacteria by Replicative Transposition

Author: NISC Comparative Sequencing Program
NISC Comparative Sequencing Program Group
Publication venue: 'American Society for Microbiology'
Publication date

Public Library of Science (PLOS)

Whole-Exome Sequencing Identifies Homozygous AFG3L2 Mutations in a Spastic Ataxia-Neuropathy Syndrome Linked to Mitochondrial m-AAA Proteases

We report an early-onset spastic ataxia-neuropathy syndrome in two brothers of a consanguineous family characterized clinically by lower extremity spasticity, peripheral neuropathy, ptosis, oculomotor apraxia, dystonia, cerebellar atrophy, and progressive myoclonic epilepsy. Whole-exome sequencing identified a homozygous missense mutation (c.1847G>A; p.Y616C) in AFG3L2, encoding a subunit of an m-AAA protease. m-AAA proteases reside in the mitochondrial inner membrane and are responsible for removal of damaged or misfolded proteins and proteolytic activation of essential mitochondrial proteins. AFG3L2 forms either a homo-oligomeric isoenzyme or a hetero-oligomeric complex with paraplegin, a homologous protein mutated in hereditary spastic paraplegia type 7 (SPG7). Heterozygous loss-of-function mutations in AFG3L2 cause autosomal-dominant spinocerebellar ataxia type 28 (SCA28), a disorder whose phenotype is strikingly different from that of our patients. As defined in yeast complementation assays, the AFG3L2Y616C gene product is a hypomorphic variant that exhibited oligomerization defects in yeast as well as in patient fibroblasts. Specifically, the formation of AFG3L2Y616C complexes was impaired, both with itself and to a greater extent with paraplegin. This produced an early-onset clinical syndrome that combines the severe phenotypes of SPG7 and SCA28, in addition to other "mitochondrial" features such as oculomotor apraxia, extrapyramidal dysfunction, and myoclonic epilepsy. These findings expand the phenotype associated with AFG3L2 mutations and suggest that AFG3L2-related disease should be considered in the differential diagnosis of spastic ataxias.

Kölner UniversitätsPublikationsServer